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Abstract — NASA’s Ice, Cloud and Land Elevation Satellite-II 
(ICESat-2) mission is a decadal survey mission (2016 launch). 
The mission objectives are to measure land ice elevation, sea 
ice freeboard, and changes in these variables, as well as to 
collect measurements over vegetation to facilitate canopy height 
determination. Two innovative components will characterize the 
ICESat-2 lidar: 1) collection of elevation data by a multibeam 
system and 2) application of micropulse lidar (photon-counting) 
technology. A photon- counting altimeter yields clouds of discrete 
points, resulting from returns of individual photons, and hence 
new data analysis techniques are required for elevation deter- 
mination and association of the returned points to reflectors of 
interest. The objective of this paper is to derive an algorithm that 
allows detection of ground under dense canopy and identification 
of ground and canopy levels in simulated ICESat-2 data, based 
on airborne observations with a Sigma Space micropulse lidar. 
The mathematical algorithm uses spatial statistical and discrete 
mathematical concepts, including radial basis functions, density 
measures, geometrical anisotropy, eigenvectors, and geostatistical 
classification parameters and hyperparameters. Validation shows 
that ground and canopy elevation, and hence canopy height, can 
be expected to be observable with high accuracy by ICESat-2 
for all expected beam energies considered for instrument design 
(93.01% -99.57% correctly selected points for a beam with 
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expected return of 0.93 mean signals per shot (msp), and 
72.85% -98.68% for 0.48 msp). The algorithm derived here is 
generally applicable for elevation determination from photon- 
counting lidar altimeter data collected over forested areas, land 
ice, sea ice, and land surfaces, as well as for cloud detection. 

Index Terms — Algorithms, altimetry, laser measurements, 
satellites. 

I. Introduction 

D ETERMINATION of vegetation height of earth’s forests 
is an essential requirement in the estimation of global 
and regional biomass and carbon levels. Because of the scale 
of the problem and the inaccessibility of many of earth’s 
forested areas, this is best achieved by satellites. NASA’s 
Ice, Cloud and Land Elevation Satellite (ICESat) mission 
(2003-2009) has resulted in important new findings in ecology 
[19], [30]— [33], [35]— [37], [40], [42], [43], in addition to many 
results in the primary mission objectives in the cryospheric sci- 
ences (e.g., [5], [7], [15], [16], [25], [27]-[29], [34], [44]-[46], 
[49]-[52], see also http://ICESat.gsfc.nasa.gov/publications). 
ICESat ceased operation in 2009. The National Research 
Council’s “Decadal Survey” [38] identified ICESat-2 as one 
of its first-tier missions citing the urgent need to observe the 
rapidly changing cryosphere [39], [47], with launch currently 
planned for 2016 [1], [2]. 

Laser altimetry is suited to observe vegetation height and 
structure because the returned signals include return from the 
top of the canopy, from within the canopy, and from the 
ground. Therefore the ICESat-2 mission has an ecosystem 
science requirement, stated as, “ICESat-2 shall produce ele- 
vation measurements that enable independent determination 
of global vegetation height with a ground track spacing of 
less than 2 km everywhere over a two-year period.” Based on 
results from the ICESat mission, which included canopy height 
estimates with root-mean- square errors of 2-6 m [32], [37], 
[40], [43], it is expected that the ICESat-2 mission, operating 
continuously in a 91 -day exact repeat orbit, will facilitate 
derivation of a vegetation height product at 1-km spatial 
resolution if off-nadir pointing is used to increase the spatial 
distribution of observations over terrestrial regions. There are, 
however, different requirements in orbit design and sampling 
for vegetation science and for the ICESat-2 mission’s primary 
cryospheric objectives [2], [14], [17], [26]. Hence a different 


0196-2892/$31.00 © 2013 IEEE 


This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. 


2 


IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING 


Decadal Survey Mission, Deformation Ecosystem Structure 
and Dynamics of Ice (DESDynl), was planned to include a 
lidar specifically designed to measure vegetation height. The 
importance of the ICESat-2 project in estimation of biomass 
and carbon levels has increased substantially following the 
recent cancellation of the lidar component of the DESDynl 
mission. 

Determination of vegetation height from ICESat-2 mea- 
surements will be based on the determination of canopy 
and ground elevations. This is not trivial, because ICESat-2 
will use a fundamentally different lidar than was used on 
ICESat, and identification of ground and canopy in the result- 
ing data requires development of new mathematical methods 
and algorithms. Two innovative components will character- 
ize the ICESat-2 lidar compared to the ICESat mission: 

1) collection of elevation data by a multibeam system and 

2) application of micropulse lidar (photon-counting) tech- 
nology [2]. In comparison with the classic pulse-limited 
altimeter, a micropulse photon-counting altimeter yields clouds 
of discrete points, which result from returns of individual 
photons, and hence new data analysis techniques are required 
for elevation determination and association of returned points 
to reflectors of interest including land and sea ice surfaces, 
ground, tree canopy, water, clouds, and blowing snow. 

Identification of tree canopy is especially challenging 
because of the fuzzy margin of a tree crown, and detection 
of ground under a possibly dense canopy is difficult because 
only a small percentage of the originally transmitted photons 
penetrate the atmosphere and the tree cover, get reflected 
from the ground, and, after reflection, penetrate tree cover 
and atmosphere again, before reaching the receiver aboard 
the satellite. Therefore, the vegetation algorithm development 
poses a mathematically more difficult problem than the ice 
algorithm design. Furthermore, a lower number of signal 
photons may be expected to be received from vegetation, 
because reflectance of ice surfaces is much higher than that 
of tree crowns and ground. In addition, forested areas tend to 
occur in regions with more humid atmospheres and/or higher 
aerosol densities than glaciers and ice sheets, which tends to 
further reduce the number of received signal photons. 

Similar technologies to those proposed for ICESat-2 have 
been applied in airborne lidars. ICESat-2 will use a lidar 
with a center wavelength 532 nm, while ICESat used two 
frequencies (532 and 1064 nm) facilitated by a frequency- 
doubling crystal [44] . Micropulse photon-counting technology 
has been developed and described by [9]— [13], and data from a 
Sigma Space Corporation airborne sensor that implements this 
forms the basis of the algorithm development and analysis in 
this paper. Micropulse single-photon ranging with a multibeam 
push-broom configuration operating at 532 and 1064 nm has 
been used by the Slope Imaging Multi-polarization Photon- 
counting Lidar (SIMPL) altimeter [8], [18]. 

Spacebome multibeam lidar technology is being used for 
topography observations of the moon and Mercury. The lunar 
orbiter laser altimeter (LOLA) of the Lunar Reconnaissance 
Orbiter (LRO) (http://lunar.gsfc.nasa.gov/lola/about.html, 
launch date June 18, 2009) uses a multibeam laser 
generated by propagating a single laser pulse through a 


diffractive optical element that splits it into five beams 
and analysis of the returned pulse. The mercury laser 
altimeter (MLA) of the MESSENGER mission to Mercury 
is a pulse-limited 1064-nm four-beam laser, operated at 
5 Hz (http://nssdc.gsfc.nasa.gov/nmc/experimentDisplay.do? 
id=2004-030A-05, launch date August 3, 2004). Both these 
missions employ analysis of laser pulses, whereas ICESat-2 
will base altimeter measurements on analysis of ranging to 
individual photons, which necessitates development of new 
algorithms for elevation determination. Such an approach is 
described here. 

The objective of this paper is to derive and describe a math- 
ematical algorithm that allows detection of ground and canopy 
in micropulse photon-counting lidar data, of characteristics 
similar to those that will be expected from ICESat-2, and 
to apply these to forest data. So that the most challenging 
cases can be solved, data were taken from the Smithsonian 
environmental research center (SERC) forest, which has a 
dense canopy. In Section II, the simulated data and ICESat-2 
instrument design cases for vegetated areas are described to 
aid the reader in understanding the data structure and hence the 
possibilities for generalizations of the method to similar and 
other lidar data. The simulation method is not the topic of this 
paper. In Section III, the mathematical concepts applied in the 
algorithm are defined; Section IV then provides a description 
of the numerical algorithm. Section III provides the mathe- 
matical fundamentals and the motivation for their application. 
This section also serves as a reference for applications to 
other similar data analysis tasks. Section IV can be used by 
a programmer as a step-by-step guide for implementation of 
the algorithm and generation of a software for analysis of 
photon-counting laser altimeter data over forested areas. These 
sections may serve as a basis for an algorithm theoretical basis 
document for ICESat-2, but are written with a broad audience 
in mind. In Section V, the algorithm is applied to analyze 
simulated data, based on observations over a dense broad-leaf 
forest, and Section VI is dedicated to algorithm validation. 
Section VII summarizes, discusses, and concludes this paper. 

II. ICESat-2 Instrument Design Cases 
and Data Description 

A. Micropulse Photon- Counting Lidar Data 

The sensor used in the ICESat mission, namely, the Geo- 
science Laser Altimeter System (GLAS) [44], was a pulse- 
limited laser altimeter. Elevation determination is based on 
the analysis of waveforms fitted to the returned signal, the 
peak of the waveform is associated with geolocation of the 
“point” (footprint center) from which the signal is returned, 
and elevation is derived from the two-way travel time asso- 
ciated with the waveform peak. Micropulse photon-counting 
technology, as pioneered by [9]— [13], is realized in an air- 
borne system built by Sigma Space Corporation (and in other 
instruments). The Sigma Space system operates at 532 nm 
wavelength, and the same wavelength will be used for the 
Advanced Topographic Laser Altimeter System (ATLAS) that 
is under development for ICESat-2. In this paper, simulations 
of ICESat-2 ATLAS data based on lidar data collected with the 
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Sigma Space system will be used for algorithm development. 
The Sigma Space system yields data in a 3-D point cloud, 
resulting from 100 beams. The ATLAS system will have 
discrete sets of individual beams of different strengths. This 
requires a reduction, resampling, and simulation to generate 
the expected ATLAS -type data from the collected Sigma Space 
data. The received ATLAS -type data for a single beam are 
simulated as summarized in Section II-D; the output is given 
as a 2-D projection of the cloud of single-photon reflections 
versus along-track distance of a ground track. Pseudo-ranges 
are converted to pseudo-elevations in meters. Use of a 2-D 
projection in the simulated data is appropriate because the 
ATLAS lidar will assign all received noise and signal photon 
counts to the central beam axis. We analyze simulated data 
from beams of different strengths. The multibeam capacity of 
ICESat-2 is not analyzed here. 

B. Design Cases for a Multibeam Sensor for ICESat-2 

Designs of a multibeam system discussed for ICESat-2 
include a combination of stronger and weaker beams. Science 
requirements in ice observation have led to the observation 
requirement of a multibeam system, while energy constraints 
limit the number of equally strong beams to about 2-4. On 
one hand, beams with higher transmit energy have a higher 
probability of penetrating clouds and dense atmosphere and 
hence yield sufficiently many surface returns for surface identi- 
fication. The number of surface photons is constrained by both 
environmental factors (such as atmospheric conditions and 
surface reflectance) and instrument factors (such as transmit 
energy or detector efficiency). A lidar system only penetrates 
thin clouds, but clouds prevail in the Arctic 70%-80% of the 
time. On the other hand, a larger number of beams are needed 
to observe the spatial variability of the ice surface, which 
provides characteristics indicative of ice types, ice dynamics, 
morphogenesis of sea ice, and other parameters of interest, 
and improves accuracy of ice elevation mapping and change 
detection (see [1], [20] and other work cited therein [2]). 
The combination of these constraints suggests a design that 
includes beams of different strengths. The two favorites at 
times of this research were a nine-beam design (with beam 
strengths 1-2-1; 2-4-2; 1-2-1; i.e., center beam in each row 
twice as strong as outer beams, yielding corner beams a 
quarter of the strength of the center beam) and a six-beam 
design (with strengths 1-4; 1-4; 1-4; i.e., a weak beam and a 
strong beam in each row). In this paper, we use predictions of 
ATLAS radiometry over ground and tree canopy to generate 
simulated datasets from the Sigma Space lidar data collected 
over vegetated areas. These simulated datasets are used to 
develop an algorithm to derive both the ground elevation and 
canopy height. 

C. Characteristics of the Forest Type 

The dense forests of the SERC, located in eastern North 
America (38.889° N, 79.559° W), were selected as study 
sites, because the SERC forests are well characterized through 
previous work and the relatively dense canopy represents a 
challenge for ground detection. Hence an algorithm that works 


for ground detection in the SERC forests is likely to work 
for less dense forests. Airborne Sigma Space lidar data were 
collected there during leaf-on conditions in early October 
2009. 

The SERC forest contains 3350 trees of 84 recognized 
species on 16 ha and is situated adjacent to a subestuary 
of the Chesapeake Bay on the coastal plain near Edgewater, 
Maryland. The square 16-ha plot is dominated by mature 
secondary upland forest but is bisected with a section of 
floodplain forests, both around 120 years since initiation. The 
upland forest is an example of the “tulip poplar” association 
with an overstory dominated by tulip poplar (Liriodendron 
tulipifera), several oaks (Quercus spp.), beech (Fagus gran- 
difolia), and several hickories (Cary a spp.); a mid-canopy 
of red maple (Acer rubrum) and sour gum (Nyssa sylvat- 
ica); and an understory composed of American hornbeam 
(Carpinus caroliniana), spicebush (Lindera benzoin), and paw- 
paw (Asimina triloba). The flood plain forest is dominated 
by ashes (Fraxinus spp.), sycamore (Platanus occidentalis), 
and American elm (Ulmus Americana). Installation of the 
plot began in September 2007. The forest is rather tall (as 
high as 40 m) and has a high richness for this part of the 
temperate zone, with more than 34 species of at least 20.0 
cm diameter at breast height (DBH; it is a standard method 
of expressing the diameter of the trunk or bole of a standing 
tree. DBH is one of the most common dendrometric mea- 
surements). As of November 2009, the tagging and censusing 
of all woody stems 31 cm DBH in about 9.0 ha of the plot 
have been completed [41]. At time of the survey with the 
Sigma Space photon-counting sensor in October 2009, the 
SERC forest had reached a mature state with a closed canopy 
cover (over 95% canopy closure) and leaves were still on 
(Geoffrey Parker, see http://www.ctfs. si. edu/site/SERC%3A+ 
Smithsonian H-Environmental+Research+Center; 2- 1 0-20 1 2) . 

D. Simulated ICESat-2 Data Based on Airborne Sigma Space 
Corporation Data: File Name Conventions 

As stated above, our simulation of ICESat-2 data used 
photon-counting data over the SERC forests collected by a 
commercial lidar developed by Sigma Space Corporation. The 
Sigma Space lidar is a high-repetition-rate (kHz) 100-beam 
scanning photon-counting lidar. The ICESat-2 is expected to 
have six beams, at fixed angles, and will also use a high 
repetition rate (10 kHz). Given these differences, the photon 
cloud collected by the Sigma lidar spans roughly 800 m at 
the flight altitude used here in the across-track direction and 
has substantially higher signal photon density than ICESat-2 is 
expected to achieve. Therefore, the Sigma data must be down- 
sampled both geometrically and radiometrically to simulate 
ICESat-2. Observation flights were conducted at dusk, which 
results in reduction of background photons from sunlight, 
compared to mid-day ambient light. Elimination of many (but 
not all) background photons above canopy and below ground 
was performed by manually applying a prescribed spatially 
variable range window that includes trees and ground surface. 
The resulting dataset is referred to as a database of signal-only 
photons here, though we are unable to perfectly discriminate 


This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. 


IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING 


4 

between signal and noise photons. Our simulation generates 
ICESat-2 simulated datasets that vary with respect to noise 
levels, radius of photon capture, resampling, laser intensity 
and expected number of mean photons per shot, subarea/flight 
track, and several realizations of random processes [3], [4]. 

The Sigma data were scaled to mimic the expected spatial 
distribution of ICESat-2 data. The combination of the nominal 
10-kHz repetition rate of the ICESat-2 laser with a nominal 
7 km/s ground speed of the spacecraft yields a footprint on 
the ground every 70 cm along the flight direction of ICESat- 
2. Therefore, in order to scale the Sigma data, we selected 
straight-line segments along the aircraft ground track and 
defined footprint locations every 70 cm by interpolating along 
the aircraft track. The footprint of ICESat-2 is expected to 
be circular, nominally 10 m in diameter, with a Gaussian 
energy distribution within the footprint. For a given ICESat-2 
footprint, the location of the footprint center will be known, but 
the point of origin of any recorded photon within a footprint 
will not be known. As such, all received photons are effectively 
collapsed in space to the footprint center. For the Sigma 
data, individual photon locations are provided; these locations 
are therefore collapsed to the defined footprint center in our 
ICESat-2 simulated data. 

For each ICESat-2 simulated footprint, the simulation has 
three main steps. First, the expected number of returned pho- 
tons in the footprint is generated using a Poisson-distributed 
random number with a mean equal to the expected number of 
signal photons per shot (msp) based on the ATLAS radiom- 
etry model for vegetation and the ground under vegetation. 
([3], see also A. Martino, AtlasPerformance20100421.xls on 
the ICESat-2 website). Because of the canopy closure, reflec- 
tivities of the ground surface and vegetation are imperfectly 
known and the atmospheric conditions are not known, so the 
number of signal photons in a given footprint is determined 
in large part by ground observation and published values for 
ground and vegetation reflectivity. As such, the simulation 
should be considered notional rather than a precise simulation 
of the SERC forest. 

Second, the locations of these signal photons are chosen 
from the Sigma Space data using a Gaussian- weighted random 
distribution with a 2-sigma diameter of 10 m to select a 
radial distance from the footprint center and a uniform random 
distribution to select an angle with respect to aircraft ground 
track. This last step allows us to randomly sample the ground 
within the 10-m swath of ICESat-2 footprints and to simulate 
the jitter of footprint locations. 

Third, since the Sigma Space dataset has a lot more photons 
than we expect from ICESat-2, we need a metric to determine 
which Sigma Space photon to select to represent the photon 
location determined in step 2. We select the closest Sigma 
Space photon to the photon location from step 2. The region 
of photon selection is limited to a 1-m radius circle around 
the desired photon location from step 2, and extends vertically 
through the Sigma Space data cloud (termed “1-m cap size”). 
If no photons are found within this cylinder, then none is 
selected for the simulation. Photon selection is limited to 
within the cap size for three reasons: 1) to avoid selecting 
photons that are too far away from the desired location to be 


selected by the ICESat-2 instrument. For example, if step 2 
yielded a photon location near the edge of the 10-m diameter 
footprint, and we used a 5-m cap size, a photon 15 m from the 
footprint center could be chosen; 2) to minimize computer time 
required to select photons. Given the Sigma Space data density, 
a large cap size would greatly increase the computational 
effort. For each photon location selected in step 2, a 10-m 
cap size would not only exacerbate the problem noted above 
but also mean searching through thousands of Sigma Space 
photons to find the one closest to the desired location; and 
3) control the average number of signal photons in a given 
simulation. A smaller cap size is used to simulate ICESat-2 
data from source datasets with very large photon density 
compared to a larger cap size for datasets with relatively sparse 
photon data. If a 10-m cap size was used to select photons 
from the Sigma data, the resulting data product would have 
far more signal photons than can be expected from ICESat-2; 
this is primarily a consequence of the number of beams (100) 
in the Sigma lidar, the fast pulse repetition rate, and the 
relatively slow aircraft ground speed. The radius of photon 
capture is a cut-off value for the entire simulated dataset. 

Step 3 is repeated for each footprint location (from step 2) 
until the number of desired photons (from step 1) is reached. 
A Sigma Space photon may be used in one or more simulated 
footprints owing to the overlapping nature of the ICESat-2 
footprints. We tuned the simulation parameter “capsize” so 
that the average number of signal photons selected over the 
entire length of the simulated dataset was approximately the 
same as the average number of signal photons predicted by 
the ATLAS radiometry model. 

Because ICESat-2 assigns all photon locations to the foot- 
print center, a projection of the three-dimensional point loca- 
tions to 2-D point locations (xt,Zi) is applied, with x / distance 
in meters along a ground track, Zi pseudo-elevation in meters, 
and i = 1 , ,n,n e AT the number of points in the simulated 
dataset. The total distance along track for the Sigma datasets 
from SERC is 2500 m. Noise is added in a subsequent step, 
with noise levels as described in Section II-C. 

In the following description, file-name extensions are given 
in brackets in the order in which they occur in the file names. 

Radius of photon capture (capl), defined as described in 
step 3 above, is 1 m in all datasets. 

Resampling [rO (no resampling), rl (resampling)] indicates 
the photon reuse flag, rO indicates that no photon is used more 
than once, and r 1 indicates that a photon can be selected for 
any footprint even if it was selected for a previous footprint. 
Effectively, rO results in fewer recorded photons per shot 
than rl. 

Laser intensity , quantified as “msp”, is the mean signal 
photons expected per shot (p4: 0.48 msp, p9: 0.96 msp). Laser 
intensity is used to characterize the different strengths of the 
beams considered for the ICESat-2 instrument. Intensity is 
quantified by a floating point number indicating the mean 
number of signal photons desired per footprint (per shot). 
The number of photons selected for any footprint is calculated 
using a Poisson-distributed random function with this mean. 
Note that this paper analyzes the two cases of the weak beam 
and the medium-strong beam of the nine-beam design, as 
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these are the cases limiting instrument design; there is also the 
case of the strong beam (1.93 msp), for which no simulations 
are included in the 99 ICESat-2 type SERC datasets (version 
2010 ). 

Subarea/ flight track (sbO-1, sb0-3, sb0-5). In the airborne 
experiment conducted by Sigma Space Corporation over the 
SERC forests, data along five tracks were collected and three 
of those were used to create the simulated datasets. 

Randomization instance (si, s2, s3, s4) refers to a new run 
of the simulation with a different random seed. Randomization 
allows us to generate several different simulations for the same 
ground track that will select different photons. (A random 
seed is a large integer number that is selected to start a 
new simulation, mathematically a new realization of a random 
function. This is commonplace in probability theory. The two 
realizations only differ in the random part, while all other 
constraints of the simulation — here flight track, noise level, 
resampling, laser intensity — remain the same.) 

Noise levels [uz2 (lowest), uz3 (middle), uz5 (highest)]. Ran- 
dom noise is added in the simulations to mimic different 
atmospheric conditions, typical of nighttime conditions (uz2, 
0.5 MHz), clear daytime conditions, as encountered on a crisp 
winter day (uz3, 2 MHz), and hazy daytime conditions, as 
encountered on a humid summer day (uz5, 5 MHz) (pers. 
comm. S. Palm, see also [48]). The existence of solar back- 
ground noise and atmospheric scattering provides one of the 
main challenges in detection of returns from vegetation and 
ground under canopy. 

III. Mathematical Concepts of the Algorithm 

Problems that must be addressed in the determination of 
canopy height from photon-counting lidar data include the 
following: fuzzyness of tree crowns; poor signal-to-noise ratios 
in many observational cases; roughness of the ground; trends 
in slope of the ground over larger distances; and specific 
density of trees per unit area which varies with forest type. 
An approach that mimics the statistical analysis of data 
collected with a pulse-limited laser, such as the Geoscience 
laser altimeter system (GLAS) aboard ICESat, and proceeds 
by matching along-track histograms to Gaussian functions to 
identify the return will not generally allow the identification 
of ground and canopy. The photon-counting data point clouds 
often yield histograms with multiple maxima and minima, 
which makes identification of a reflector difficult or result 
in situations where ground is not visible in the histogram 
at all. To overcome these problems, we develop a different 
mathematical approach. 

A challenge lies in the implementation of an algorithm that 
facilitates automation of a “soft” solution that selects those 
regions as canopy and ground that visually appear as such 
in the cases of the stronger beams or less noise, and, in 
addition, succeeds at ground and canopy detection even in 
those cases that cannot be interpreted visually any more. This 
will be achieved by a combination of a density quantification 
that uses radial basis functions [see Section (M.2)], and a 
generalization of the so-called hyperparameter concept of the 
geostatistical classification method, adapted and applied to 


histograms of the density function results. The mathematical 
approach described here employs spatial statistical methods 
and discrete mathematics building on concepts similar to those 
developed by the first author for other signal processing and 
spatial classification problems [21], [22], [24] and developing 
new concepts where needed. 

Radial basis functions are used as a means to identify cen- 
ters of dense point clouds over background noise (M.2, M.3). 
Noting that canopy has a tendency to extend horizontally more 
than vertically motivates introduction of an anisotropy matrix 
(M.4) in the identification of density centers (M.3). Ground 
is in principle a simply connected feature (in the sense of 
1 -connectedness in mathematical topology), but may appear 
as disconnected segments of denser areas in the photon data; 
this is addressed by an anisotropic search as well. Further 
analysis of density centers and their histograms will lead to the 
discovery that a method for selecting the most relevant maxima 
among other maxima is needed. In our problem the relevant 
density center maxima will be those that identify canopy and 
ground. To this end, the so-called hyperparameter concept 
will be applied. The idea of the hyperparameter concept is 
to capture those items that stand out visually [24] (see M.5 
and M.6). Finally, geostatistical classification facilitates iden- 
tification of ground and canopy centers (M.7). In summary, 
the computational algorithm developed for simulated ICESat- 
2 data analysis includes the following components: 1) calcula- 
tion of spatial density centers, taking into account anisotropy 
(implemented with help of eigenvectors); 2) analysis of the 
cumulative distribution function, filtering, and application 
of the hyperparameter method of geostatistical classification 
adapted to identify ground and canopy ranges; 3) design and 
application of density threshold functions for identification of 
canopy and ground over background Scatterers; 4) optional 
linear interpolation; and 5) several plotting options and an 
optional data output for comparative analyses and validation. A 
step-by-step description of the algorithm is given in Section IV. 

Mathematical concepts that have been developed specifi- 
cally for this algorithm, or may not be generally known, are 
explained in the following subsections. 

(M.l) Globalization-localization paradigm. 

(M.2) Radial basis function (rbf). 

(M.3) Rbf density. 

(M.4) Geometrical anisotropy. 

(M.5) Geostatistical classification parameters: slope parame- 
ter p i and significance parameter p 2 . 

(M.6) Hyperparameters. 

(M.7) Application of geostatistical classification ideas to the 
histogram of the density values. 

(M.l) Globalization-Localization Paradigm 

A new globalization-localization approach is used to 
overcome a well-known statistical sampling problem, by 
disconnecting sampling bases in different steps of the algo- 
rithm. The idea here is to treat the following problem, typical 
of statistical analysis. If the data window (in distance along the 
flight path) is too small, then not enough photons are available 
to derive sufficient statistical information to identify ground 


This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. 


6 


IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING 


under canopy. If the data window is too large, then ground 
and canopy may not be separable any more. The globalization- 
localization idea used here is to disconnect the two problems 
by using a large window (here, an entire dataset) to derive 
a suite of statistical parameters then in another algorithm 
step employ a local classification or detection algorithm that 
utilizes the globally derived parameters. Future ICESat-2 data 
are expected to be much larger datasets than the simulated 
data analyzed here, and hence the globalization-localization 
paradigm can be implemented using a large moving window 
in the along-track direction combined with smaller windows 
inside that. 

(M.2) Radial Basis Function 

An rbf is a real-valued function whose value depends on 
distance from a center c e V for all i in a definition area V 

<D(x,c) = <D(||x-c||) (1) 

with respect to any norm || • ||. In the algorithm, we utilize a 
Gaussian rbf (letting r = x — c and s e IZ) 

<D(r) = e m (2) 

Visualized as a surface in F ? , this rbf has the shape of (half) 
a Gaussian bell curve rotated around the location of a center 
c e IZ 2 . In the photon data analysis, we have c e IZ 3 and 
the surface is in IZ 4 . More formally, the Gaussian probability 
density function is 

1 -r^) 2 

/normpdf = , e ^ (3) 

V2/rcr 2 

with standard deviation a and mean p of the population. 
Replacing o — s and jli = 0 yields 

O (r ) = (T \Fhz /normpdf • (4) 

(See [6].) 

(M3) Density Centers 

Identification of points within tree crowns is motivated by 
the observation that a tree crown is a diffuse reflector and a 
volumetric scatterer, but points within the tree crown have a 
high probability of being located within clusters of other parts 
of the tree crown, which is a property that does not hold for 
reflections of ambient light or noise outside of the tree crowns. 
To identify points located inside clusters or clouds of points 
with higher density, the rbf concept is applied as follows. 

For the photon data analysis problem, the definition set V 
is the set of all photons (in a track or window). For each point 
c e V, a density value fd (c) is calculated by summing up rbf 
values for all neighbors within a 15-m radius 

fd(c) = Y, ®(\\x - c\\a) (5) 

xeV c 

with V c = [x e V : \\x — c || 2 < 15m} the set of all points 
within a given radius (here: 15 m) from the center point c 
[note that in this initial distance determination simply the 
2-norm (Euclid norm) || • || 2 is used]. In the rbf, we use a norm 


|| • || a that takes anisotropy into account. The value of 15 m 
is chosen because the radius of a cluster of reflectors within 
a tree crown is usually less than 15 m. This heuristic choice 
of a value is used in all analyses, but is easily changed. The 
concept of density centers is illustrated in Fig. 1(a) and (b). 

(M.4) Anisotropy Norm 

Using an anisotropy norm is motivated by the notion that 
tree canopy has a tendency to extend more in the horizontal 
direction than in the vertical direction. When the anisotropy 
norm is combined with the rbf, points found in a horizontal 
direction from the center point are weighted higher than points 
found in a vertical direction. The following algorithm imple- 
ments a matrix multiplication that is an affine transformation 
of the density function (the rbf) into a function of ellipsoidal 
shape. The anisotropy norm is defined as 

Holla = II A (») || 2 (6) 

for any vector v e IZ 3 , with a transformation matrix 

/I0°\ 

A = (0 i 0 . (7) 

\0 0 1 / 

This is applied to the density centers c and all their 
neighboring points in (6) as 

||a-c|U = ||A(a-c)|| 2 . (8) 

Points of the same rbf value 0(||x — c \\ a ) are now located on 
an ellipsoid with axes (3, 3, 1) around the center point c and 
(half) Gaussian bell curves along each radial line. The density 
value fd (c) then reflects the tendency of tree crowns to connect 
horizontally into forest canopies. In the 2-D realization of the 
simulated dataset, a transformation matrix 

< 9) 

is used. (The same anisotropy norm is used for ground, as 
ground continues more in horizontal direction. For terrain with 
a high topographic relief, the anisotropy matrix A can be set 
to a different value, or to identity.) 

(M.5-M.7) Geostatistical Classification Ideas and Their Appli- 
cation to Histogram Analysis 

Several algorithm concepts are inspired by concepts of the 
geostatistical classification method [22], [24] and modified 
to solve the lidar data analysis problem. Analysis of the 
variogram or its generalization, the vario function, is the basis 
of the geostatistical classification, but some of the principles 
transfer to any function that is affected by noise and here 
are applied to the histogram of the data and the histogram of 
density. More generally, we may consider any positive real- 
valued discrete function f(xi ), defined for values Xi, i = 1 ,n. 

The geostatistical classification proceeds by the analysis 
of sequences of minima and maxima in the vario function, 
derivation of parameters from those sequences, construction of 
a feature vector from the parameters, and classification or class 
association based on the feature vector. A related problem in 
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along-track distance [m] 

(b) density 




pseudo-elevation [m] 


(d) histogram of lidar data 



(e) filtered histogram of lidar data 


Fig. 1. Analysis concepts, (a) Simulated ICESat-2 lidar data, (b) Density 
of data in (a), derived from summation of radial basis function values for 
anisotropic neighborhood, (c) Result of analysis. Stars mark density centers 
within the two classes of ground and canopy, derived using histograms in (e), 
and lines are piecewise-linear interpolations of density centers within each 
class, (d) Histogram of lidar data elevation values shown in (a), (e) Filtered 
histogram of lidar data elevation values shown in (a). Stars mark the two 
most significant local maxima, which are determined using the hyperparameter 
concept and identify highest frequencies in ground and canopy elevations. 


signal processing is the analysis of a time series or recording 
of a time-variable signal, which is often based on the analysis 
of the minima and maxima of the signal. 


(M.5) Geo statistical Classification Parameters 

Let f(xi) be a positive real-valued discrete function defined 
for values x/, i = 1 ,n. This function may be a histogram, 
a variogram, or a vario function. We introduce classification 
parameters used in the photon classification problem. The 
mindist parameter is defined as the lag of the first minimum 
after the first maximum in the function, mindist gives the 
spacing of parallel features recorded in the function. We 
further define the significance parameters pi and p2 as 


p i 

p2 


f (Amaxi ) / (Amini ) 

•^maxi -^mini 
/ (Amaxi ) ~ f (-Tmini ) 
/ (Amaxi) 


( 10 ) 

( 11 ) 


where pi is the slope parameter and p2 the relative signifi- 
cance of the first minimum mini after the first maximum maxp 
In this notation 

mindist = x m i ni . (12) 


Parameters of types pi and p2 can be calculated for any 
max-min sequence, defining 

/(* max/ ) f (-Tmin j ) 

pt 1 (max, mm) = (13) 

i J T ma x/ — inl- 

and an analog to (11) for p2-type parameters, for i < j and 
the convention that minimum min/ always follows maximum 
max/ . Note that slope parameters involve distance but p2-type 
parameters do not. 


(M.6) Hyperparameters 

A problem typical of the analysis of complex and noisy 
processes or datasets is that the maxima and minima that tell 
the “story” of the problem can be identified visually because 
they stand out but are numerically obscured by noise or by 
other processes that may interfere with the main process of 
interest. In the lidar data analysis, we use a robust search algo- 
rithm to automatically identify “bigger” max-min sequences 
and associated generalized parameters, as described in [24]. 
We determine bigmax, the largest maximum in a group of g 
maxima, and then bigmin, the smallest minimum in a group 
of g minima following bigmax. For a fixed groupsize g , a 
sequence of bigmaxs and bigmins can be determined, and 
the selected ones are those that survive several increases of 
the group size. The optimal group size for a given problem 
can be determined automatically. Here we have applied a 
criterion to find stable group sizes such that the bigmax- 
bigmin pair stays the same for three consecutive group sizes. 
The so-determined parameters bigmax^ , bigmin^ , i = 1 ,n, 
are termed hypermaxima and hyperminima. For these selected 
hypermaxima and hyperminima, hyperparameters are defined 
as generalizations of (10), (11), and (13) 

/ (-Xbigmax;) “ / (*bigmin) 

pt 1 (bigmaxy , bigmin ) = — (14) 

■Tbigmax/ — -Tbigmiiiy 


and 

2 (bigmax; , bigmin^) 


/ (-Tbigmax/ ) / (-^bigminy ) 

/ (-Tbigmax/) 


(15) 
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(M.7) Application to Histograms of Forest Lidar Data 
and Density 

The hyperparameter concept is applied to identify the two 
main maxima in the histogram, which represent ground and 
canopy. It is a necessary piece in the analysis, because even 
after filtering, many histograms of forest lidar data have several 
maxima that may be identified as ground or canopy [see 
Fig. 1(d)]. The geostatistical classification concepts are applied 
to the histograms of elevation values and to the histograms of 
density values (see M.3 and M.4). 

IV. Algorithm Steps 


The algorithm proceeds by the following steps. 

1) Import Data: Data are recorded returns of individual 
photons, with P = (xi,X2,z) is the location of the 
reflector in three dimensions, z = z(X) = z(x 1 ,^ 2 ) 
is the elevation value of a photon in location P, and 
X = (x\,X 2 ) is the projection of the photon’s location 
onto the ground. Data are loaded into the program. 

2) Identification of Ground and Canopy Elevation Ranges 
by Histogram Analysis of Photon Elevation Data: 

a) A histogram of the elevation values of received 
photons is created and grouped by elevation bins. 
Here, we used 100 elevation bins for a total eleva- 
tion range of 100 m (bin size 1 m). In the analysis 
of simulated ICESat-2 datasets, the histograms are 
built for the entire datasets. In an application of 
the algorithm to future ICESat-2 data, the analysis 
will be carried out for moving windows along the 
satellite ground track. 

b) The following analysis is based on automated iden- 
tification of hypermaxima and hyperminima in the 
histogram (step 2c). Prior to this, high-frequency 
wiggles in the histogram are smoothed out by the 
application of a low-pass filter to the histogram 
created in step (2a) to avoid that the automated 
algorithm catches on insignificant maxima and 
minima. As described in [23], this can be achieved 
by a five-point moving average; formally 


— (% iSiQ —2 T & 2^0—1 T ^3^/0 T ^4^0+1 T - ^5^0+2 ( 16 ) 


with 


and 


Z«j = l 

7=1.5 


a\ = as and 02 = a\ 


(17) 

(18) 


and high values for the central coefficients 
a 2 , as, «4 and low values for the outer coefficients 
a\, as. A Butterworth filter with 


a = (a\, 02, as, 04, of) 

= (0.0625,0.25,0.375,0.25,0.0625) (19) 

is selected, because it satisfies the criteria in 
(16)— (1 8). A Butterworth filter is a well-known 
low-distortion low-pass filter and hence reduces 
high frequencies; when used in the spatial domain 


as in our application, it reduces the occurrence 
of insignificant maxima and minima. Good results 
have been obtained in applications of this filter 
in connection with the geostatistical classification 
method [22], [23]. Here, this filter is applied to 
histograms, with s- t the histogram value of bin i 
for i e {l, ... ,m} and m the number of bins. The 
central index of the filter is indicated by io, and 
the filtered value si 0 filt replaces the central value 
Si 0 . An example is shown in Fig. 1. 

c) In the next step, two hypermaxima are identi- 
fied (bigmax 1 and bigmax 2 ). These are the two 
maxima that stand out visually, and will repre- 
sent ground and canopy elevation centers (see 
mathematical concept hyperparameters). For the 
ground and canopy range detection problem, the 
hyperparameter location algorithm is adapted from 
that described in [24] for hyperparameters of vario 
functions. 

For the ground and canopy range detection prob- 
lem, the following algorithm is used to determine 
the hypermaxima locations by an iterative process: 
In the first iteration, group size is g\ = 1, and all 
local maxima in the histogram are identified and 
written into an index list. To go from step (n — 1) to 
step n of the iteration, the following is used: given 
a list of maxima in the index set I n - 1 , the group 
size is increased g n = g n -\ + 1 and the largest 
maximum within each group of g n maxima in the 
original list is determined and written into Index 
set If A maximum is retained in list I n if it was 
already in the previous list 

In^In-XElf ( 20 ) 

Iteration is continued until at most two maxima 
are left {rib is the index of the break point of the 
iteration) 

\In b \<2. (21) 

Noting that I nb - 1 may contain more than two max- 
ima, the two most significant maxima in I nb ~ 1 are 
selected, using a param_2-type criterion [see the 
mathematical concept of the significance parameter 
P2 in (M.5)], with the constraint that the final two 
maxima must be at least eight histogram bins apart. 
(This corresponds to 8 m in the SERC study and 
is easily changed.) 

The hypermaxima are identified in the histograms 
in Fig. 2; panel (b) demonstrates that it is neces- 
sary to determine the hypermaximum in a series 
of maxima that remain after Butterworth filter- 
ing. After application of this step, two “eleva- 
tion centers” are identified, bigmax g and bigmax c , 
with bigmaXg < bigmax c and the corresponding 
x-locations bigmax g (xg) and bigmax c (x c ). 

d) The process for determining a canopy elevation 
range and a ground elevation range, described 
in this step, is illustrated in Fig. 2(c) and (d). 
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Fig. 2. Histogram analysis, (a) Ideal situation with strong and single maximum for ground (highest) and canopy (second highest), (b) Case where Butterworth 
filter smoothes out outlying maxima, (c) Case where bigmin criterion is used for canopy-range determination, (d) Case where paraml (slope) criterion is used 
for canopy-range determination. Note: Color bars are plotted in the order they are calculated; earlier lines may be hidden. Color coding is as follows: green: 
mirror around the selected (starred) maximum; red: parameter bigmin; yellow: parameter paraml (slope); black: compared; magenta: limit used. 



density value 

(a) histogram of density values 



(b) filtered histogram of density values 

Fig. 3. Threshold analysis, demonstrated for ground detection. The threshold 
used is the bin associated with 0.5 of the histogram value of the bigmax of 
density of the ground range dataset. In this example, 0.5 x 72 = 36 for the 
histogram values, the ground threshold then becomes 20. The noise threshold 
is the bin associated with the bigmax, which is 9 in this case. 

Colored lines are used for illustration. First, the 
minimum z m in(xo) between the ground and canopy 
centers bigmax g and bigmax c is determined. Then 


the minimum is mirrored around the ground and 
canopy center locations, as v gre en g = x g — (xo — x g ) 
and Xg r een c = x c + (. x c — xo). The green lines are 
placeholders for finding the range values. Three 
local minima closest to the green lines are iden- 
tified in lo, the one with the lowest minimum is 
termed Zredfeed) (this is a hyperminimum) and the 
one with the steepest slope to the associated hyper- 
maximum is termed Zyeilow (^yellow) (this utilizes 
a pi -type criterion). Finally, the range limits are 
determined using the slope values from the “red” 
and “yellow” points to the hypermaxima 

Z final (-Tfinal) = ^red(-Tred) (22) 

if 

0.8 ptl (bigmax c , Zyellow (-Tyellow)) 

< ptl (bigmax, Zred(*red)) 

< 1 .2pf 1 (bigmax c , Zyellow(-Tyellow)) (23) 

and 

£ final C^final) = ^yellow (-^yellow) (24) 

otherwise. The elevation range for ground is deter- 
mined analogously. 

3) Segmentation of the Dataset Into Ground and Canopy 
Range Sets: The ground and canopy elevation ranges 
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determined in step 2 are applied to segment the photon 
dataset into a canopy range set and a ground range set, 
and a rest class (elevations higher than canopy range 
or lower than ground range). It is worth emphasizing 
that the ground and canopy range sets are not a classi- 
fication of photons into ground and canopy returns but 
a segmentation of the global dataset into sets in which 
ground and canopy can be found. 

The next analysis steps are then carried out separately 
for the ground range set and the canopy range set. 
Globalization-localization (M.l). Note that the segmen- 
tation algorithm can be applied in a window. For the 
SERC data, the algorithm steps 1-3 have been applied 
globally. The following steps (4-9) are applied in a 
localization. This allows us to use the properties of a 
larger window, or the whole dataset, for a first identi- 
fication of elevation points in a likely range, based on 
the histogram analysis. Then in the second part of the 
algorithm, different mathematical concepts are applied 
to identify points that are ground and canopy reflectors. 

4) Apply Density Function for Canopy Center Identifica- 
tion: Density values fd (c) are calculated as described in 
M.3, using the rbf (5) for all points in a 15-m radius. For 
the function evaluation, the distance values transformed 
according to the anisotropy norm described in M.4 are 
employed. The sum of all rbf values of all neighbors 
of a point is called the (rbf-)density of that point. 

5) Histogram of Photon Density in the Canopy and Ground 
Region: A histogram H(d) of the density values d 
(Step 4) for photon events in the canopy region is 
calculated in 100 evenly spaced bins and filtered using 
the Butterworth filter with the same values a = 
(0.0625,0.25,0.375,0.25,0.0625) as in step 2b. The 
maximal histogram value is identified as H max (d m ) 
where d m is the density value for which the maximum 
occurs. 

Then a canopy threshold d c is set. 

Let H c = 0.8 // max and determine d c as the density 
value with H(d c ) = H c and d c — d m . 

For ground threshold d g , a factor of 0.5 is used. 
Let H g = 0.5 // m ax, where d g is the density value 
with H(d g ) = H g and d g > d m . Note that a lower 
percentage of the histogram’s maximum results in a 
higher threshold. Fig. 3 illustrates this step for ground 
detection. 

6) Apply the Noise Filter: The density value d m for which 
the largest density count occurs (as defined in step 5) 
is used as a noise threshold, and points with density 
less than d m are rejected. The noise threshold uses a 
linear function to separate signal area densities and noise 
area densities. Noise area densities are automatically 
determined as the lowest uniform density (the areas 
where these occur do not need to be identified.) 

7) Recompute the Density Function: To eliminate possible 
high-density noise clusters, the density function (5) is 
applied a second time, as described above (including 
anisotropy norm). A high-density point with only noise- 
type density neighbors will be reassigned a much lower 


density value in this second run of density, compared to 
the first run. 

We write d x for the density value of a point (z(x), x) 
after the second run of the rbf fd . 

8) “ Build line. ” Canopy Class Association ( More Clearly, 
Define the Set of Discrete Points That Are in the Canopy 
Class): A point (z(x),x) is identified as a member of 
the canopy set if all the following hold: 

a) d x > d c (and d x > d g for ground); 

b) (z(x),x) is the point with maximal density in a 
10 m along-track interval; and 

c) a rigidity criterion is satisfied. 

The rigidity criterion fixes a maximal elevation differ- 
ence that is likely to occur among photons reflected from 
the same tree or neighboring trees, as \z(xi)—z(xi-\)\ < 
rig for z(x) elevation in location x. 

The rigidity condition may be adjusted to match forest 
types; for instance, mapping needle trees in sparse stands 
may require a higher rigidity number than leaf trees 
in dense forests. The rigidity condition can be relaxed 
entirely. 

9) Ground Detection: To detect ground under canopy and 
associate discrete photon points to the ground class, 
steps 4-8 are repeated using the ground parameters. 
Canopy and ground lines are illustrated in Fig. 1(c). 

V. Results 

In this section, we analyze under which conditions the 
design cases for the beams (see Section II) can be expected to 
yield useful data for observation of ground and canopy levels 
in forests. We present results of several case studies, selected 
from a total of 99 test cases of simulated data. All cases are 
analyzed with the same algorithm. 

In the first case study, typical cases of the medium-strong 
beam, labeled msp9 (short: p9) are investigated (Fig. 4). 
This figure demonstrates that the algorithm works for the 
medium- strong beam, the two options of resampling (without 
(rO) and with (rl) resampling), and increasing noise levels. 
The plots show the simulated data in the top panel and 
the interpretation of ground and canopy by the detection 
algorithm. Points that are original signal points in SERC 
forest observations are colored red, while noise points resulting 
from the simulation are shown in black. The signal-to-noise 
information was not used in the algorithm, but it aids in 
visually assessing validity of the algorithm. Information on 
ground versus canopy or reflections of other items (birds, rocks 
or other features, atmospheric reflections) are not provided. In 
this section, visual validation is used; statistical validation is 
given in Section VI. 

In all cases, the level of the canopy is well detected by the 
algorithm. The canopy assumes similar shapes in all cases in 
spite of increasing noise and two different sampling strategies. 
The resampling flag “rl” indicates that resampling is allowed, 
which increases the signal-to-noise ratio. Cases labeled “rO” 
(no resampling) constitute a weaker signal, given the same 
noise level (left column of figure panels, (a), (c), and (e). At 
the start of the window, no canopy data are identified, and this 
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Fig. 4. Data and ground/canopy detection for the stronger beam (msp9), without and with resampling (rO, rl) (columns) and increasing noise levels (uz2, 
uz3, uz5) (rows). All examples are SERC forest data, Section I, simulation 2. 


matches the visual impression. In the case studies, an entire 
flight segment of 2500 m is analyzed. For actual satellite or 
aerial observation datasets, a moving window algorithm will 
be implemented, which will eliminate edge effects that occur 
at the start and the end of the shorter segments analyzed here. 

Detection of ground level under canopy also works well, 
but the number of points identified as canopy shows some 


variability. The software includes a simple piecewise-linear 
interpolation option that allows continuation of ground level 
across large gaps [over 400 m in 4(a), over 600 m in 4(b)]. 
Even in the worst case of combination of no resampling of 
beams and the highest noise levels, the detection of ground 
and canopy succeeds. Since the ground and canopy detection 
works for both resampling options, science or engineering 
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TABLE I 

Summary of Validation of Ground Set (Top) and Canopy Set (Bottom). Correctly Identified Photons 


Ground 


Case 


uz2 



uz3 



uz5 



uzAll 



Mean 

Median 

% 

Mean 

Median 

% 

Mean 

Median 

% 

Mean 

Median 

% 

p9, rO 

0.45 

0.00 

97.20 

0.49 

0.00 

95.78 

0.55 

0.00 

94.70 

0.49 

0.00 

95.89 

p9, rl 

0.20 

0.00 

99.47 

0.24 

0.00 

99.22 

0.33 

0.00 

98.81 

0.26 

0.00 

99.17 

p9, rAll 

0.33 

0.00 

98.33 

0.36 

0.00 

97.50 

0.44 

0.00 

96.75 

0.38 

0.00 

97.53 

p4, rO 

0.89 

0.00 

90.25 

0.93 

0.00 

89.79 

0.82 

0.00 

85.28 

0.88 

0.00 

88.44 

p4, rl 

0.31 

0.00 

98.51 

0.38 

0.00 

98.48 

0.38 

0.00 

98.04 

0.36 

0.00 

98.34 

p4, rAll 

0.64 

0.00 

93.79 

0.70 

0.00 

93.52 

0.63 

0.00 

90.75 

0.66 

0.00 

92.68 


Canopy 


Case 


uz2 



uz3 



uz5 



uzAll 



Mean 

Median 

% 

Mean 

Median 

% 

Mean 

Median 

% 

Mean 

Median 

% 

p9, rO 

0.38 

0.00 

96.00 

0.57 

0.00 

93.70 

0.83 

0.00 

93.01 

0.59 

0.00 

94.23 

p9, rl 

0.18 

0.00 

99.57 

0.30 

0.00 

99.04 

0.37 

0.00 

98.95 

0.28 

0.00 

99.19 

p9, rAll 

0.28 

0.00 

97.78 

0.43 

0.00 

96.37 

0.60 

0.00 

95.98 

0.44 

0.00 

96.71 

p4, rO 

0.94 

0.00 

85.92 

1.29 

0.13 

82.01 

2.26 

1.20 

72.85 

1.50 

0.44 

80.26 

p4, rl 

0.25 

0.00 

98.68 

0.35 

0.00 

98.06 

0.53 

0.00 

97.46 

0.38 

0.00 

98.07 

p4, rAll 

0.65 

0.00 

91.39 

0.89 

0.07 

88.89 

1.52 

0.69 

83.40 

1.02 

0.25 

87.89 


Mean: mean distance in meters from a point identified as ground/canopy to the nearest signal point in the validation set. Median: 
median distance in meters from a point identified as ground/canopy to the nearest signal point in the validation set. %: percent of 
points identified as ground/canopy points that are also in the validation set of signal points. Values are for groups of datasets. p9: 
all sets associated with medium-strong beams; p4: all sets associated with weak beams. rO, rl: all sets with resampling rO, rl resp. 
rAll: all resampling options; uz2,uz3,uz5: low, medium, and high noise levels reps; uzAll: all noise levels. 


criteria can be employed for deciding between the two resam- 
pling options. 

To investigate whether it may be possible to utilize the 
weakest beams in the ICESat-2 sensor panel and still expect 
to detect ground under canopy in observations of forests, a 
case study for the weakest beams (“msp4”) is conducted, 
using the same algorithm as for stronger beams, without 
tuning any parameters. Fig. 5 illustrates how the algorithm 
performs in the worst cases of the weakest beam (msp4) and 
no resampling (rO), increasing noise. Notably, the algorithm 
functions for all three noise levels, and automated detection 
exceeds the possibilities of visual detection of ground and 
canopy. In comparison to results in Fig. 4, ground can still 
be detected in sufficiently many locations to derive ground 
level, but there is a tendency for noise clusters to be misiden- 
tified as ground. Canopy heights agree in general among 
the three cases. Quantifications of these statements will be 
given in Section VI on validation. The results are encour- 
aging to include the weakest beams in the instrument panel 
for ICESat-2. 

Introduction of a flexible tracking rigidity parameter serves 
two purposes: 1) options in detection of canopy and ground 
for weak beams in case of weakly nonstationary ground and 
canopy levels and 2) a possibility to match forest-type specific 
characteristics. To explain the application option 1, tracking 
rigidity can be employed to improve detection of ground and 
canopy in situations of weak beams and most noise. To give an 
example of application 2, a forest with wide- standing conifers 
or pines may result in lidar data that show individual trees, 
hence a large slope outlining tree shape may be appropriate, 
and consequently a high rigidity parameter will be helpful. 
A forest with a dense leaf-tree canopy of homogeneous age 
typically has a narrow range of crown-top elevations, which is 
better detected by a lower rigidity parameter. Fig. 6(a) and (b) 


illustrate the effect of using two different rigidity parameter 
values for analysis of the same dataset. In the example, canopy 
height curves in the lower example (with higher rigidity) have 
a tendency to pick up points that may be located between tree 
crowns. The effect of showing individual trees is expected to 
be more pronounced in analysis of data from wide- standing 
forests (than in the SERC data, where dense canopies prevail). 
Fig. 7 shows that the rigidity parameter can be employed to 
improve ground and canopy detection for the weakest beam 
(msp4) combined with the no-resampling option (rO), which 
effectively yields fewer signal photons and the highest noise 
level (uz5) (see Fig. 5). 

VI. Validation 

To facilitate algorithm validation, the original signal points 
are flagged in the simulated ICESat-2-type datasets. (The 
simulated data contain a column with a 0-1 flag, the value 1 
is given for each original data point.) This information was 
not used in the detection and classification algorithm and can 
therefore serve for validation of the algorithm. Results of the 
validation are given in Table I for the following statistical 
parameters, calculated separately for ground and canopy: 
1) percentage of points selected that are signal points; 2) and 
3) distance in meters from a point that has been identified as a 
signal point to the nearest point that is a signal point, given as 
mean and median of nearest-neighbor distances in 3-D space. 
Note that the distances in 2, 3 are not elevation errors. Results 
listed in Table I are summarized from results obtained for 
all 99 datasets, so that performance for weak beams (msp4), 
medium- strength beams (msp9), resampling options, and the 
three noise levels can be analyzed. 

For ground, the percentage of correctly selected points 
is 94.7%-99.47% for all groups of medium- strength (msp9, 
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Fig. 6. Experiments using tracking rigidity. Same algorithm, except with 
lower rigidity parameter (top panel) and higher rigidity parameter (lower 
panel), applied to the same dataset (msp p4, resampling rO, uz2, vll). 
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Fig. 5. Data and ground/canopy detection for the weak beam (msp4), without 
resampling (rO) and increasing noise levels (uz2, uz3, uz5). All examples are 
SERC forest data, Section I, simulation 1. 


Fig. 7. Application of tracking rigidity to improve detection for weak beams 
and most noise. Data and ground/canopy detection, msp p4, resampling rO, 
uz5, vll. 


short: p9) beams and 85%-98.81% for all groups of weak 
(msp4, short: p4) beams. The average value over all data sets 
in a group is 95.89% for (p9, r 0) for any noise level, 99.17% 
for (p9, r 1), and 95.53% for all medium- strength (p9) cases. 


The average value over all data sets in a group is 88.44% 
for weak beams and no resampling (p4, rO) for any noise 
level, 98.34% for weak beams and resampling (p4,rl) and 
92.68% for all cases of weak beams (p4). The median distance 
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from a point in the selected set to the nearest neighbor in 
the signal points set is always zero, and the mean distance is 
0.20-0.55 m; the resampling option has a stronger effect than 
the noise level. The validation demonstrates that the algorithm 
works very well for detection of ground under canopy. The 
elevation error is a lot smaller than the distance numbers, but 
has not been calculated directly, because the piecewise linear 
interpolation is only included for visualization of the ground 
and canopy lines, and the objective of the paper is to design 
a ground detection algorithm, not an interpolation algorithm. 

For ground, the percentage of correctly selected points is 
93.01 %-99. 57% for all groups of medium strength (msp9, 
short: p9) beams and 72.85%-98.68% for all groups of 
weak (msp4, short: p4) beams. The average value over all 
datasets in a group is 94.23% for medium- strength beams 
and no resampling (p9, rO) for any noise level; 99.19% for 
medium- strength beams and resampling (p9, r 1); and 96.71% 
for all medium- strength cases (p9). The average value over 
all datasets in a group is 80.26% for weak beams and no 
resampling (p4, rO) for any noise level; 98.07% for weak 
beams and resampling (p4, rl); and 87.89% for all cases of 
weak beams (p4). The median distance from a point in the 
selected set to the nearest neighbor in the signal points set is 
zero for all cases of medium- strength beams (p9); for all cases 
of weak beams and lowest nighttime noise levels (p4, uz2 ), 
it is 0.25 m for the average of all cases of weak beams (p4). 
The mean distance is 0.44 m for the average of all cases of 
medium-strong beams ( p9 ), the lowest for (p9, rl,uz2) at 
0.18 m, and the highest for (j?9, rO, uz5 ) at 0.83 m. The mean 
distance is 1.02 m for the average of all cases of weak beams 
(p4), the lowest for (p4, rl, uz2) at 0.25 m and the highest 
for (p4, rO), uz5 at 2.26 m. In all cases, the resampling option 
has a stronger effect on the accuracy than the noise level. This 
is a good result, because the resampling option can be set in 
the instrument-level detection algorithm, whereas noise from 
ambient light and atmospheric conditions is an environmental 
constraint that is corrected for in the data analysis. 

The results of the detection algorithm are also very good 
for canopy detection, with similarly good values as the results 
for the ground detection. The detection of the canopy is 
complicated by the facts that the canopy is fuzzy and that the 
sparse canopy returns have to be extracted from many noise 
points. These facts may explain a lack of accuracy when only 
few returns are recorded, as in the case of the weakest beam 
(msp4). 

Even in this hardest case, the average distance from a point 
in the selected set to the nearest neighbor in the signal points 
set is 1.02 m, which indicates that the weak beam may be 
expected to yield useable measurements. 

At this point, it should be recalled that the analysis includes 
the two weaker types of beams considered for ICESat-2, 
namely, the weak beam and the medium- strength beam, and 
not the strong beam. 

The dataset provided does not identify a ground dataset 
and a canopy dataset, hence the classification part of the 
algorithm cannot be validated numerically. Visual inspection 
of the results indicates that the canopy-class signal points fall 
in the upper layer and the ground-class identified points fall in 


the lower layer, and the continuity of the layers indicates that 
the classification works correctly. As component of the experi- 
mental part of the prelaunch phase ICESat-2 project, validation 
datasets and instrument test datasets will be collected. To 
complement future flights with the airborne Multiple Altimeter 
Beam Experimental Lidar (MABEL), which is a high-altitude 
(20 km) photon-counting multibeam sensor, validation flights 
with vegetation lidar s of known performance are planned. 

VII. Summary, Discussion, Outlook, and 
Conclusion 

In this paper, a set of algorithms has been developed and 
validated, which allows the detection of ground under dense 
canopy and identification of ground and canopy levels in sim- 
ulated ICESat-2-type data. These data constitute a new type of 
spaceborne lidar altimeter data that will be collected during the 
ICESat-2 mission with a next-generation multibeam photon- 
counting lidar altimeter. Data analyzed in this paper are based 
on airborne observations with a Sigma Space Corporation 
photon-counting lidar, and simulations vary with respect to 
the signal strength, noise levels, photon sampling options, and 
other properties. To consider the mathematically most difficult 
cases, data stem from dense forests observed during leaf- 
on conditions and the cases of the two weaker beam types 
were analyzed. These are: 1) a medium- strength beam with 
an expected return of 0.93 mean signals per shot (msp9) and 
2) a weak beam with 0.48 msp (msp4). The third case is a 
strong beam with 1.93 msp; this will be used in the ICESat-2 
instrument design in any case. The medium- strength beam 
(msp9) corresponds to the weaker beam in a six-beam design 
proposed sensor for ICESat-2, whereas an alternative proposed 
nine-beam design for an ICESat-2 sensor includes four corner 
beams of strength msp4, four middle beams of strength msp9, 
and a center beam with a signal rate of 1.93 msp. 

A mathematical algorithm was developed using an approach 
that combines spatial statistical and discrete mathemati- 
cal concepts, including rbfs, density measures, geometrical 
anisotropy, eigenvectors and geostatistical classification para- 
meters and hyperparameters. Piecewise linear interpolation 
was provided as an option to bridge between identified ground 
points and analogously, canopy centers. The software allows 
flexibility with respect to output types, which include graphics 
options and data output for validation and canopy height/ 
ground elevation determination. 

Validation using 99 simulated datasets showed that the 
algorithm works very well and that ground and canopy 
elevation, and hence canopy height, can be expected to be 
observable with a high accuracy during the ICESat-2 mission. 
A result relevant for instrument design is that even the two 
weaker beam classes considered can be expected to yield 
useful results for vegetation measurements (93.01 %-99. 57% 
correctly selected points for a beam with expected return of 
0.93 mean signals per shot [msp9] and 72.85%-98.68% for 
0.48 msp [msp4]. The median distance from a point in the 
selected set to the nearest neighbor in the signal points set 
is zero for all msp9 cases and for low-noise msp4 cases, and 
0.25 m for the average of all msp4 cases. The mean distance 
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is 0.44 m for the average of all msp9 cases and 1.02 m for the 
average of all msp4 cases. Notably, this is a 3-D distance error 
and not an elevation error; the expected elevation error average 
is lower. In all cases, the option of resampling versus using 
each detected photon exactly once has a stronger effect on the 
accuracy than the noise level. Following our analysis, ground 
and canopy detection, and hence determination of canopy 
height, is possible in all noise conditions. The resampling 
option can be set in the instrument-level detection algorithm. 

As of Oct 2012, the ATLAS lidar under development for 
ICESat-2 will have a nominal 41.3-m (83.3-// R) diameter field 
of view where about 85% of the returns will be expected 
from the central 12.4 m (25 juR ) containing the transmitter 
beam footprint (A. Martino, Instrument-Science-Martino.pptx, 
presented at ICESat-2 Science Definition Team Meeting). To 
ascertain launch readiness of the mission, signal detection and 
data analysis algorithms need to be developed at the same time 
as the sensor is being developed. The algorithm used here for 
data simulation based on airborne Sigma Space Corporation 
data employs a sampling from a combination of a 10-m 
diameter centered on flight track segments and a 1-m diameter 
cap size, which yields a 12-m possible diameter for photon 
selection and a 10-m resultant simulated footprint, compared 
to the 12.4-m central field of view for the Atlas instrument. It 
is important to notice that, while some parameters best used 
in ICESat-2 data simulations may change as the ICESat-2 
instrument is being refined, the mathematical principles of the 
surface, canopy, and ground detection algorithm described here 
remain valid. The MABEL is being developed and further 
improved as an airborne predecessor whose data will more 
closely resemble the expected ICESat-2 data. 

The analysis presented here does not consider the impact 
of clouds on the detection of ground and canopy, because the 
dataset is derived from airborne lidar data collected below 
cloud levels. In consequence, the results regarding expected 
retrieval of forest canopy and ground under canopy are valid 
for cloud-free atmospheric conditions only. Atmospheric con- 
ditions in the layer above the forest have been quantified by 
different noise levels within the layer, which is represented by 
the simulated data, the noise levels corresponding to nighttime 
conditions as well as clear winter and hazy summer daytime 
conditions at mid-latitude. Cloud impact on surface altimetry 
from a spaceborne 532-nm micropulse photon counting lidar is 
investigated theoretically in [48]. The effect of clouds on deter- 
mination of surface and canopy elevations can be studied with 
MABEL observations made from NASA’s high-altitude air- 
craft ER-2, which can fly at 20000 m above most cloud layers. 

The multibeam capacity of the ATLAS instrument for 
ICESat-2 has not been investigated here. Following the col- 
lection of multibeam data with MABEL, this aspect of the 
ICESat-2 mission can be analyzed. Spatial variability in 
surface reflectance, other than as recorded by the Sigma Space 
instrument, has also not been considered. MABEL data also 
have a response to surface reflectance, which is closer to that 
expected for the ATLAS instrument, but were not yet collected 
at time of this analysis. 

Because detection of ground and canopy in forested areas 
presents a technically and mathematically harder problem than 


detection of the surface in data collected over land ice and sea 
ice and most other land surfaces, the algorithm presented here 
can be expected to be applicable also for land ice, sea ice, 
as well as land surface detection and elevation determination. 
As tree canopy may be considered a diffuse reflector, the 
algorithm may be generalized for other complex and dif- 
fuse reflectors, such as rough ice surfaces and atmospheric 
reflectors such as clouds and blowing snow. In summary, the 
algorithm derived here can be used as a basis for an algorithm 
for the analysis of data from the ICESat-2 mission, data from 
the mission’s airborne precursor instrument, MABEL, and for 
analysis of photon-counting lidar altimeter data in general. 
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